We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder takes as input two types of queries: (i) generic non-semantic queries and (ii) semantic queries induced from text inputs, to decode different pixel-level and token-level outputs in the same semantic space. With such a novel design, X-Decoder is the first work that provides a unified way to support all types of image segmentation and a variety of vision-language (VL) tasks. Further, our design enables seamless interactions across tasks at different granularities and brings mutual benefits by learning a common and rich pixel-level visual-semantic understanding space, without any pseudo-labeling. After pretraining on a mixed set of a limited amount of segmentation data and millions of image-text pairs, X-Decoder exhibits strong transferability to a wide range of downstream tasks in both zero-shot and finetuning settings. Notably, it achieves (1) state-of-the-art results on open-vocabulary segmentation and referring segmentation on eight datasets; (2) better or competitive finetuned performance to other generalist and specialist models on segmentation and VL tasks; and (3) flexibility for efficient finetuning and novel task composition (e.g., referring captioning and image editing). Code, demo, video, and visualization are available at https://x-decoder-vl.github.io.
translated by Google Translate
Geographically weighted regression (GWR) is an essential tool for estimating the spatial variation of relationships between dependent and independent variables in geographical contexts. However, GWR suffers from the problem that the classical linear regressions composing it are prone to underfitting, especially on large and complex nonlinear data, which leads to inferior performance. In contrast, advanced models such as decision trees and support vector machines can learn features from complex data more effectively, but they cannot provide an explainable quantification of the spatial variation of localized relationships. To address these issues, we propose GWRBoost, a geographically weighted gradient boosting regression model that applies a localized additive model and gradient boosting optimization to alleviate underfitting while retaining an explainable quantification of spatially varying relationships between geographically located variables. Furthermore, we formulate the computation of the Akaike information criterion for the proposed model to enable comparative analysis with the classic GWR algorithm. Simulation experiments and an empirical case study demonstrate the performance and practical value of GWRBoost: the results show that our proposed model reduces RMSE by 18.3\% in parameter estimation accuracy and AICc by 67.3\% in goodness of fit.
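To make the baseline concrete, the classic GWR that GWRBoost builds on fits one weighted least-squares regression per location, with weights from a spatial kernel. Below is a minimal illustrative sketch (using an assumed Gaussian kernel and a hypothetical `bandwidth` parameter), not the authors' implementation:

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Classic GWR sketch: at each location i, solve a weighted
    least-squares problem where observations are weighted by a
    Gaussian kernel of their spatial distance to location i."""
    n = len(y)
    betas = np.empty((n, X.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian spatial kernel
        W = np.diag(w)
        # Local estimate: (X^T W X)^{-1} X^T W y
        betas[i] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return betas
```

As the bandwidth grows, every local fit approaches the global OLS solution; small bandwidths let the coefficients vary across space, which is the spatial heterogeneity GWRBoost aims to quantify while avoiding the underfitting of the local linear fits.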
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and task the participants with designing an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating frame rates of up to 500 FPS and power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.
Machine learning (ML) is a distributed approach for training predictive models on the Internet of Vehicles (IoV) to enable intelligent public transportation. Since traffic conditions change over time, ML models for predicting traffic flow and passenger waiting time must be updated continuously and efficiently. Federated learning (FL) is a distributed machine learning scheme that allows vehicles to receive continuous model updates without uploading raw data to the cloud and waiting for models to be trained. However, FL in intelligent public transportation is vulnerable to poisoning and DDoS attacks because vehicles travel in public. Moreover, owing to device heterogeneity and imbalanced data distributions, the synchronized aggregation strategy, which collects local models from specific vehicles before aggregation, is inefficient. Although asynchronous federated learning (AFL) schemes improve efficiency by aggregating local models as they are received, stale local models are still weighted unreasonably, leading to poor learning performance. Towards smarter public transportation, this paper offers a blockchain-based asynchronous federated learning scheme with a dynamic scaling factor (DBAFL). Specifically, a novel committee-based consensus algorithm for the blockchain improves reliability at the lowest time cost. Meanwhile, the designed dynamic scaling factor allows AFL to assign reasonable weights to stale local models. Extensive experiments on heterogeneous devices validate that DBAFL outperforms the state of the art in learning performance, efficiency, and reliability.
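The core idea of down-weighting stale local models can be sketched in a few lines. This is a hypothetical illustration with an assumed exponential-decay coefficient and a made-up `alpha` hyperparameter; the paper's actual dynamic scaling factor and blockchain machinery are not reproduced here:

```python
import math

def staleness_weight(current_round, model_round, alpha=0.5):
    """Hypothetical dynamic scaling factor: the aggregation weight of a
    local model decays exponentially with its staleness (the number of
    global rounds that passed since it was trained)."""
    staleness = current_round - model_round
    return math.exp(-alpha * staleness)

def aggregate(global_model, local_model, current_round, model_round, alpha=0.5):
    """Asynchronous FL update: blend the arriving local model into the
    global model, giving stale models proportionally less influence."""
    s = staleness_weight(current_round, model_round, alpha)
    return [(1 - s) * g + s * l for g, l in zip(global_model, local_model)]
```

A fresh model (staleness 0) fully replaces the blended coordinates with weight 1, while a model that is several rounds old contributes only a small correction, which is the behavior the abstract attributes to the dynamic scaling factor.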
Spatiotemporal activity prediction, which aims to predict user activity at a specific location and time, is crucial for applications such as urban planning and mobile advertising. Existing solutions based on tensor decomposition or graph embedding suffer from the following two major limitations: 1) they ignore fine-grained similarities among user preferences; 2) their modeling of users is entangled. In this work, we propose a hypergraph neural network model called DisenHCN to bridge the above gaps. In particular, we first unify fine-grained user similarities and the complex matching between user preferences and spatiotemporal activity into a heterogeneous hypergraph. We then disentangle the user representation into different aspects (location-aware, time-aware, and activity-aware) and aggregate the corresponding aspect features over the constructed hypergraph, capturing high-order relations from different aspects and disentangling the impact of each aspect on the final prediction. Extensive experiments show that our DisenHCN outperforms state-of-the-art methods by 14.23% to 18.10% on four real-world datasets. Further studies also convincingly verify the rationality of each component in our DisenHCN.
Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models exhibit strong transferability to a variety of datasets and tasks. However, evaluating the transferability of these models remains challenging due to the lack of an easy-to-use evaluation toolkit and public benchmark. To address this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for evaluating (pre-trained) language-augmented visual models. ELEVATER consists of three components. (i) Datasets. As a downstream evaluation suite, it comprises 20 image classification datasets and 35 object detection datasets, each augmented with external knowledge. (ii) Toolkit. An automatic hyperparameter tuning toolkit is developed to facilitate model evaluation on downstream tasks. (iii) Metrics. A variety of evaluation metrics are used to measure sample efficiency (zero-shot and few-shot) and parameter efficiency (linear probing and full-model fine-tuning). We release ELEVATER publicly at https://computer-vision-in-the-wild.github.io/elevater/
Loss functions play an important role in training network-based object detectors. The most widely used evaluation metric for object detection is average precision (AP), which captures the performance of the localization and classification subtasks simultaneously. However, due to the non-differentiable nature of the AP metric, traditional object detectors adopt separate differentiable losses for the two subtasks. Such misalignment can lead to degraded performance. To address this, existing works seek to hand-craft surrogate losses for the AP metric, which requires expertise and may still be suboptimal. In this paper, we propose Parameterized AP Loss, where parameterized functions are introduced to replace the non-differentiable components in the AP computation. Different AP approximations are thus represented by a family of parameterized functions in a unified formula. An automatic parameter search algorithm is then employed to search for the optimal parameters. Extensive experiments on the COCO benchmark with three different object detectors (i.e., RetinaNet, Faster R-CNN, and Deformable DETR) demonstrate that the proposed Parameterized AP Loss consistently outperforms existing handcrafted losses. Code is released at https://github.com/fundamentalvision/parameterized-ap-loss.
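The non-differentiable component in AP is the Heaviside step used to compare scores when computing ranks. A minimal sketch of the general idea, replacing that step with a temperature-`tau` sigmoid (the paper searches over a whole family of parameterized substitutions rather than fixing this particular one):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smooth_ap(scores, labels, tau=0.01):
    """Differentiable AP approximation: the indicator 1[s_j > s_i] used
    to compute ranks is replaced by sigmoid((s_j - s_i) / tau).
    `labels` is a 0/1 array marking positives."""
    pos = scores[labels == 1]
    diffs_all = scores[None, :] - pos[:, None]   # s_j - s_i over all detections j
    diffs_pos = pos[None, :] - pos[:, None]      # same, restricted to positives
    # rank(i) = 1 + sum_{j != i} 1[s_j > s_i]; the self term contributes
    # sigmoid(0) = 0.5, which we subtract back out.
    rank_all = 1.0 + sigmoid(diffs_all / tau).sum(axis=1) - 0.5
    rank_pos = 1.0 + sigmoid(diffs_pos / tau).sum(axis=1) - 0.5
    return float(np.mean(rank_pos / rank_all))
```

As `tau` shrinks, the sigmoid approaches the hard step and the value approaches true AP; larger `tau` trades fidelity for smoother gradients, which is exactly the kind of design choice the automatic parameter search optimizes.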
Efficient and effective encoding of the source code of computer programs is critical to the success of sequence-to-sequence deep neural network models for tasks in computer program comprehension, such as automated code summarization and documentation. A significant challenge is to find a sequential representation that captures the structural/syntactic information in a computer program and facilitates the training of learning models. In this paper, we propose to use the Prüfer sequence of the abstract syntax tree (AST) of a computer program to design a sequential representation scheme that preserves the structural information in the AST. Our representation makes it possible to develop deep-learning models in which signals carried by lexical tokens in the training examples can be exploited automatically and selectively based on their syntactic roles and importance. Unlike other recently proposed approaches, our representation is concise and lossless with respect to the structural information of the AST. Empirical studies on real-world benchmark datasets, using a sequence-to-sequence learning model we designed for code summarization, show that our Prüfer-sequence-based representation is indeed efficient and effective, outperforming significantly the recently proposed deep-learning models we used as baseline models.
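The classic tree-to-sequence construction this representation builds on is easy to state: repeatedly delete the smallest-labeled leaf and record its neighbor. A short sketch for labeled trees with nodes 0..n-1 (the paper's AST-specific adaptations are not reproduced here):

```python
import heapq
from collections import defaultdict

def prufer_sequence(n, edges):
    """Compute the Prüfer sequence of a labeled tree on nodes 0..n-1.
    Repeatedly remove the smallest-labeled leaf and append its unique
    neighbor to the sequence; a tree on n nodes yields n-2 entries."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [u for u in range(n) if len(adj[u]) == 1]
    heapq.heapify(leaves)                 # always pick the smallest leaf
    seq = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        (parent,) = adj[leaf]             # a leaf has exactly one neighbor
        seq.append(parent)
        adj[parent].discard(leaf)
        if len(adj[parent]) == 1:         # parent may become a leaf
            heapq.heappush(leaves, parent)
    return seq
```

The mapping is a bijection between labeled trees and sequences, which is what makes the encoding lossless with respect to the tree structure: the original tree can always be reconstructed from its Prüfer sequence.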
In recent years, segmentation methods based on deep convolutional neural networks (CNNs) have achieved state-of-the-art results for many medical image analysis tasks. However, most of these methods improve performance by optimizing the structure of, or adding new functional modules to, the U-Net, ignoring the complementarity and fusion of coarse-grained and fine-grained semantic information. To address the above problem, we propose a medical image segmentation framework called Progressive Learning Network (PL-Net), which comprises internal progressive learning (IPL) and external progressive learning (EPL). PL-Net has the following advantages: (1) IPL divides feature extraction into two "steps", which mix receptive fields of different sizes and capture semantic information from coarse to fine granularity without introducing additional parameters; (2) EPL divides the training process into two "stages" to optimize parameters, achieving the fusion of coarse-grained information in the first stage and of fine-grained information in the second stage. We evaluate our method on different medical image analysis tasks, and the results show that the segmentation performance of PL-Net is superior to the state-of-the-art U-Net and its variants.